Non-Deterministic Segmentation for Chinese Lattice Parsing
Abstract
Parsing Chinese critically depends on correct word segmentation, since incorrect segmentation inevitably causes incorrect parses. We investigate a pipeline approach to segmentation and parsing that uses word lattices as parser input, and we compare CRF-based and lexicon-based approaches to word segmentation. Our results show that the lattice parser is capable of selecting the correct segmentation from thousands of options, thus drastically reducing the number of unparsed sentences. Lexicon-based parsing models have better coverage than the CRF-based approach, but the many options are more difficult to handle. We reach our best result by using a lexicon from the n-best CRF analyses, combined with highly probable words.
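To make the idea of a lexicon-based word lattice concrete, here is a minimal sketch. It is not the authors' implementation: the function names `build_lattice` and `count_segmentations`, the toy lexicon, and the `max_len` cutoff are all assumptions made for illustration. Every lexicon match becomes a lattice edge, and each path through the lattice is one candidate segmentation.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, str]  # (start offset, end offset, word)


def build_lattice(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Collect one edge per lexicon match in the sentence.

    Single characters are always added as edges (an assumption of this
    sketch) so that at least one path covers the whole sentence.
    """
    edges: List[Edge] = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if end - start == 1 or word in lexicon:
                edges.append((start, end, word))
    return edges


def count_segmentations(n: int, edges: List[Edge]) -> int:
    """Count the paths from offset 0 to offset n, i.e. the segmentations."""
    paths = [0] * (n + 1)
    paths[0] = 1
    # Edges always point forward, so visiting them in order of start offset
    # guarantees paths[start] is final before it is used.
    for start, end, _ in sorted(edges):
        paths[end] += paths[start]
    return paths[n]


# Hypothetical toy example: a six-character string with overlapping words.
sentence = "研究生命起源"
lexicon = {"研究", "研究生", "生命", "起源"}
lattice = build_lattice(sentence, lexicon)
num_paths = count_segmentations(len(sentence), lattice)
```

With this toy lexicon the lattice has 10 edges and encodes 10 distinct segmentations of a six-character string. The number of paths grows quickly with sentence length and lexicon size, which is consistent with the abstract's remark that the parser must choose among thousands of options.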
Similar resources
Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging
In this paper, we describe a new reranking strategy named word lattice reranking, for the task of joint Chinese word segmentation and part-of-speech (POS) tagging. As a derivation of the forest reranking for parsing (Huang, 2008), this strategy reranks on the pruned word lattice, which potentially contains much more candidates while using less storage, compared with the traditional n-best list ...
A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
For the cascaded task of Chinese word segmentation, POS tagging and parsing, the pipeline approach suffers from error propagation while the joint learning approach suffers from inefficient decoding due to the large combined search space. In this paper, we present a novel lattice-based framework in which a Chinese sentence is first segmented into a word lattice, and then a lattice-based POS tagg...
An Efficient Chinese Parsing Algorithm for Computer-Assisted Language Learning
Instructional grammar is often used in Computer-assisted Language Learning (CALL) and the grammatical error detection is an important feature. However, it is not an easy task in Chinese language. There is no delimiter separating consecutive words in Chinese sentences. Word segmentation is a process in which proper word boundaries are identified. Before syntactic parsing of a Chinese sentence, w...
Chinese Word Segmentation in MSR-NLP
Word segmentation in MSR-NLP is an integral part of a sentence analyzer which includes basic segmentation, derivational morphology, named entity recognition, new word identification, word lattice pruning and parsing. The final segmentation is produced from the leaves of parse trees. The output can be customized to meet different segmentation standards through the value combinations of a set of ...
Ungreedy Methods for Chinese Deterministic Dependency Parsing
Deterministic dependency parsing has often been regarded as an efficient algorithm while its parsing accuracy is a little lower than the best results reported by more complex methods. In this paper, we compare deterministic dependency parsers with complex parsing methods such as generative and discriminative parsers on the standard data set of Penn Chinese Treebank. The results show that, for C...
Publication date: 2017